STATISTICAL IMAGE PROCESSING FOR REALTIME OPERATIONS


ANNUAL  REPORT 


and  current  research  activities  with  this  project  are  also  described  in 


II.  STATISTICAL  IMAGE  RECOGNITION 


the limited number of learning samples but also the finite number of
quantization levels. Experimental results based on the multivariate
Gaussian probability density assumption indicate that the maximum
likelihood decision rule (MLDR) and the nearest neighbor decision
rule (NNDR) perform very much the same at moderate sample size, say,
between 100 and 400 samples. Theoretical results have demonstrated
the "peaking" phenomenon between the probability of correct
classification and the dimensionality, i.e. the feature number, for finite
sample size in both MLDR and NNDR. For the NNDR with Gaussian patterns,
we have established an empirical expression for the optimum dimensionality


$$ p_{\mathrm{opt}} = \frac{4}{9} N^{0.578} \tag{1} $$

if $k = 5$ and $2 \le p \le 10$. Here $k$ is the number of nearest neighbors
used, $p$ is the dimensionality, and $N$ is the sample size.


A decision tree or multistage classifier, if properly designed,
requires much less computation time for a desired accuracy in recognition
or interpretation of images than conventional maximum likelihood
classification. It also exhibits, in theory, the peaking phenomenon in
mean recognition accuracy. Presently a nonparametric approach is used in
a binary decision tree design for detection of objects in a series of
aerial photographs. A theoretical error bound is also considered for the
tree classifier.


The effect of quantization on object recognition performance can be
summarized as follows. At 256 levels the performance is nearly the same
as in the continuous gray scale case. In the limiting case of two levels,
i.e., the binary picture, the object is still detectable if the threshold
is properly adjusted. Assuming optimal quantization at every number of
levels, the probability of correct object recognition approaches its
unquantized value exponentially as the number of levels grows. Let
$P_{c2}$ and $P_{cu}$ be the probabilities of correct recognition for
binary and unquantized pictures respectively. The probability of correct
recognition can then be written as

$$ P_c = P_{c2} \left( 1 - e^{-k\ell} + e^{-2k} \right) \tag{2} $$

where $\ell$ is the number of quantization levels and $k$ is determined from

$$ P_{cu} = P_{c2} \left( 1 + e^{-2k} \right). $$
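The quantization relation can be checked numerically. The following is a minimal sketch in Python, assuming the relation reads P_c = P_c2 (1 - exp(-k l) + exp(-2k)) with k fixed by P_cu = P_c2 (1 + exp(-2k)). The function names and the sample probabilities (0.80 and 0.95) are hypothetical and are not taken from the report.

```python
import math


def solve_k(p_c2, p_cu):
    """Solve P_cu = P_c2 * (1 + exp(-2k)) for k."""
    excess = p_cu / p_c2 - 1.0
    if not 0.0 < excess <= 1.0:
        raise ValueError("need P_c2 < P_cu <= 2 * P_c2 for a nonnegative k")
    return -0.5 * math.log(excess)


def p_correct(levels, p_c2, p_cu):
    """Probability of correct recognition at a given number of levels."""
    k = solve_k(p_c2, p_cu)
    return p_c2 * (1.0 - math.exp(-k * levels) + math.exp(-2.0 * k))


if __name__ == "__main__":
    # Hypothetical binary and unquantized recognition probabilities.
    for levels in (2, 4, 16, 256):
        print(levels, round(p_correct(levels, 0.80, 0.95), 4))
```

At two levels the expression reduces to P_c2, and for a large number of levels it tends to P_cu, which is how the constant k is pinned down.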


2. Statistical Feature Extraction

Features extracted from the histograms, edges, and textures are
employed in the binary decision tree classifier for object detection.
Preprocessing plays an important role in the extraction of effective
features.


3. Statistical Contextual Analysis


Image models take into account the contextual information from
nearest neighbors. Compound decision theory requires knowledge of
the probability densities, which are usually unavailable for images.

4.  Nonparametric  Learning 

The  nonparametric  approach  is  most  suitable  for  image  analysis 
as  the  probability  densities  are  usually  unknown.  Effort  is  made  to 
examine  the  learning  algorithms  using  nonparametric  procedures.  The 
results  will  be  reported  in  the  IEEE  1980  International  Conference 
on  Cybernetics  and  Society  (Attachment  I). 

III.  STATISTICAL  IMAGE  MODELS 

For  many  images  in  practical  applications,  statistical  information  is 
most  important.  Statistical  image  modelling  provides  a  good  approximation 
to  image  characterization  and  simplifies  many  image  processing  tasks 
(Technical  Report  EE-TR-79-6).  The  autoregressive  moving-average  (ARMA) 
model  is  particularly  suitable  for  image  analysis  and  for  enhancement  of 
noisy  images.  The  model  takes  into  account  for  each  pixel  the  gray  levels 
of  its  finite  number  of  nearest  neighbors. 

Recently, we have considered a two-dimensional ARMA model, described
as follows. Assume that the image is a sample from a two-dimensional
homogeneous random field with the autocovariance function

$$ R_{xx}(i,j) = \sigma^2 \exp(-c_1 |i| - c_2 |j|), \quad c_1, c_2 > 0 \tag{3} $$

where $\sigma^2$ is the variance of the image, and $i$ and $j$ are the spatial
increments in the vertical and horizontal directions respectively. Also assume
that the observable (noisy) image is given by

$$ y(t,s) = x(t,s) + v(t,s), \quad t = 1, \ldots, N; \; s = 1, \ldots, M \tag{4} $$

where $y(t,s)$ is the noisy observable image at $(t,s)$ and $v(t,s)$ is a
Gaussian white noise field with mean zero and variance $\gamma$. Our objective
is to restore $\{x(t,s)\}$ based on the noisy observations $\{y(t,s)\}$. The
two-dimensional ARMA model considered is

$$ \begin{aligned} y(t+1,s+1) ={}& a_1 y(t,s+1) + a_2 y(t+1,s) - a_1 a_2 y(t,s) \\ &+ v(t+1,s+1) + (K-1)\left[ a_1 v(t,s+1) + a_2 v(t+1,s) - a_1 a_2 v(t,s) \right] \end{aligned} \tag{5} $$

where $K$ is the stationary gain. Equation (5) represents a separable
two-dimensional system. Define the sample variance of $v(t,s)$ as

$$ A(\theta) = \frac{1}{NM} \sum_{t=1}^{N} \sum_{s=1}^{M} v^2(t,s) \tag{6} $$

where $\theta = (a_1, a_2, K)$. We identify the parameters $a_1$, $a_2$, and $K$
such that $A(\theta)$ is minimized. The method of steepest descent is used in
the optimization process. The minimum value of $A(\theta)$ is the estimate of
the variance $\sigma^2$ of the prediction error $v(t,s)$. For a fixed $\theta$
and a given observable image $\{y(t,s)\}$, the prediction error $v(t,s)$ can be
computed from Equation (5) at each point. When the optimal values of $a_1$,
$a_2$ and $K$ are obtained, we can compute the predicted image by using the
expression $\hat{x}(t,s) = y(t,s) - v(t,s)$.
For the simulation study, Figure 1a shows a noisy image of size 150 x 150 with
histogram given by Figure 1b. The signal-to-noise ratio is 1.73. The
restored image after 6 iterations is shown in Figure 2a, with initial
parameter values $a_1 = 0.8$, $a_2 = 0.9$ and $K = 0.1$. Each iteration takes
about 20 minutes on the PDP 11/45 minicomputer. Figure 2b is the histogram of
the restored image, which shows a significant improvement over Figure 1b.
The computer program of the ARMA model is listed in the Appendix.
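The identification loop can be illustrated with a short sketch. It solves Equation (5) for the prediction error at each point, averages the squared errors as in Equation (6), and takes one steepest-descent step on (a1, a2, K). Two things here are assumptions rather than the report's method: the error is set to zero on the first row and column, and the gradient is approximated by central differences, whereas the Appendix program appears to use a backward (adjoint) recursion. All function names are hypothetical.

```python
def prediction_error(y, a1, a2, K):
    """Residual field v obtained by solving Eq. (5) for v(t+1, s+1).

    Assumption: v is zero on the first row and first column.
    """
    rows, cols = len(y), len(y[0])
    v = [[0.0] * cols for _ in range(rows)]
    for t in range(1, rows):
        for s in range(1, cols):
            # Autoregressive part acting on the observed image.
            ar = (y[t][s] - a1 * y[t - 1][s] - a2 * y[t][s - 1]
                  + a1 * a2 * y[t - 1][s - 1])
            # Moving-average part acting on earlier residuals.
            ma = (a1 * v[t - 1][s] + a2 * v[t][s - 1]
                  - a1 * a2 * v[t - 1][s - 1])
            v[t][s] = ar - (K - 1.0) * ma
    return v


def sample_variance(v):
    """A(theta) of Eq. (6), averaged over the points where v is computed."""
    rows, cols = len(v), len(v[0])
    total = sum(v[t][s] ** 2 for t in range(1, rows) for s in range(1, cols))
    return total / ((rows - 1) * (cols - 1))


def descent_step(y, theta, step=1e-3, h=1e-4):
    """One steepest-descent update of theta = [a1, a2, K].

    The gradient is a central-difference approximation (an assumption).
    """
    grad = []
    for i in range(3):
        up, dn = list(theta), list(theta)
        up[i] += h
        dn[i] -= h
        grad.append((sample_variance(prediction_error(y, *up))
                     - sample_variance(prediction_error(y, *dn))) / (2 * h))
    return [p - step * g for p, g in zip(theta, grad)]
```

The restored image then follows pointwise as y minus the residual field returned by `prediction_error` at the final parameter values.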

COMPARATIVE  EVALUATION  OF  IMAGE  PROCESSING  TECHNIQUES 

A critical comparison of the median filtering, autoregressive (AR),
ARMA, and Kalman filtering operations was reported in Technical
Report EE-TR-79-8. Kalman filtering performs the best on images
while requiring relatively little computation time. Figure 3a is the
original picture, a real image of size 300 x 400. The result of median
filtering (3x3 window) followed by the Roberts cross gradient is shown in
Figure 3b. Gaussian noise is added to Figure 3a to give a signal-to-noise
ratio of 1.73. After Kalman filtering one scan line at a time,
horizontally and then vertically, the restored image is shown in Figure
4a with the corresponding histogram given by Figure 4b. The two peaks
corresponding to background and objects are more pronounced than in the
histogram (not shown) of the original picture. Kalman filtering requires much
less computation time than the two-dimensional ARMA model described in the
previous section. For the picture size of 300 x 400, which is more than
four times the size 150 x 150 considered in the previous section, Kalman
filtering for both horizontal and vertical processing requires about 27
minutes. In all software implementations performed, sequential processing
is used even though parallel and pipeline operations may considerably
reduce the computation time. The ARMA model described in the previous
section requires considerably more storage space than Kalman filtering.
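The line-by-line procedure can be sketched as follows. The report does not state the signal model behind the Kalman filter, so this sketch assumes a first-order autoregressive scan line, x[n] = a x[n-1] + w with measurement y[n] = x[n] + v, where w and v have variances q and r. The function names and parameters are hypothetical.

```python
def kalman_scan_line(y, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter along one scan line.

    Assumed model: x[n] = a*x[n-1] + w, y[n] = x[n] + v,
    with var(w) = q and var(v) = r.
    """
    x, p = x0, p0
    out = []
    for z in y:
        # Prediction from the previous pixel.
        x_pred = a * x
        p_pred = a * a * p + q
        # Measurement update with the current noisy pixel.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (z - x_pred)
        p = (1.0 - gain) * p_pred
        out.append(x)
    return out


def kalman_image(img, a, q, r):
    """Filter every row, then every column of the row-filtered image."""
    rows = [kalman_scan_line(row, a, q, r) for row in img]
    cols = [kalman_scan_line(list(col), a, q, r) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

Each pixel is visited once per pass with a fixed amount of arithmetic and only one line in memory, which is consistent with the lower time and storage cost reported above relative to the iterative ARMA identification.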




PUBLICATIONS  AND  CURRENT  RESEARCH 


Conference papers published thus far under the contract are:

1. C. H. Chen, "Statistical Image Processing and Recognition,"
COMPSAC, November 1979 (see Attachment IX).

2. C. H. Chen, "Applications of Statistical Pattern Recognition,"
presented at the Joint Statistical Meeting in Washington, D.C.,
August 13-16, 1979. The full paper is published by the American
Statistical Association in January 1980.

Technical Reports prepared under the contract are:

1. C. H. Chen, "Statistical Image Modelling," SMU-EE-TR-79-6,
October 25, 1979. To be presented at the ENAR Joint Statistical
Meetings in Charleston, SC in March 1980.

2. C. H. Chen and Jenshiun Chen, "A Comparison of Image Enhancement
Techniques," SMU-EE-TR-79-8, December 26, 1979. Submitted
for publication.

Current  research  is  based  on  the  topics  described  in  the  original 
proposal  with  emphasis  on  ARMA  models  and  statistical  image  recognition. 

In addition to the principal investigator, three graduate students
participate in the research. One graduate research assistant works on the
computer study of image models and processing techniques. Two graduate
teaching/research assistants work on infrared image analysis and
statistical/structural image processing respectively.
Appendix: ARMA Model Computer Program


C     ADD GAUSSIAN NOISE TO AN IMAGE WINDOW, THEN REMOVE THE MEAN
C     AND SCALE BY FAC.  STATEMENT LABELS ARE RECONSTRUCTED.
      BYTE B(402), C(150,30)
      REAL Y(150)
      INTEGER OFN, OUTXC, CNT
      DOUBLE PRECISION SUM
      READ(6,80) IFN, OFN, IXI, IXF, IYI, IYF, I1, I2, S, FAC
80    FORMAT(8I5, 5F10.4)
      IYL = IYF - IYI + 1
      IXL = IXF - IXI + 1
      CNT = 0
      SUM = 0
C     PROCESS THE WINDOW IN BLOCKS OF 30 ROWS
      DO 10 I = IYI, IYF, 30
      DEFINE FILE IFN(301, 201, U, INXC)
      DO 20 J = 1, 30
      INXC = I + J - 1
      READ(IFN'INXC) B
      KL = 1
      DO 30 K = IXI, IXF
      C(KL,J) = B(K)
30    KL = KL + 1
20    CONTINUE
      END FILE IFN
      DEFINE FILE OFN(450, 300, U, OUTXC)
      DO 40 J = 1, 30
      DO 50 K = 1, IXL
      CALL BY2IN(C(K,J), NT)
      CALL GAUSS(T, S, I1, I2)
      Y(K) = FLOAT(NT) + T
50    SUM = SUM + Y(K)
      OUTXC = CNT + J
      WRITE(OFN'OUTXC) Y
40    CONTINUE
      END FILE OFN
      CNT = CNT + 30
10    CONTINUE
C     SECOND PASS: SUBTRACT THE MEAN AND SCALE EACH ROW
      DEFINE FILE OFN(450, 300, U, OUTXC)
      XMEAN = SUM/((IYF-IYI+1)*(IXF-IXI+1))
      DO 60 I = 1, IYL
      OUTXC = I
      READ(OFN'OUTXC) Y
      DO 70 J = 1, IXL
70    Y(J) = (Y(J) - XMEAN)/FAC
      OUTXC = I
      WRITE(OFN'OUTXC) Y
60    CONTINUE
      WRITE(6,62) XMEAN
62    FORMAT(' MEAN = ', F12.5)
      CALL BELL
      CALL EXIT
      END

C     GAUSSIAN SAMPLE WITH STANDARD DEVIATION S FROM 48 UNIFORMS.
C     THE SUM HAS MEAN 24 AND VARIANCE 4, HENCE THE FACTOR 0.5.
      SUBROUTINE GAUSS(T, S, I1, I2)
      T = 0.0
      DO 10 I = 1, 48
10    T = T + RAN(I1, I2)
      T = T - 24.0
      T = T*0.5
      T = T*S
      RETURN
      END


C     STEEPEST DESCENT ON Q = (A1, A2, K) WITH STEP SIZE AL
      REAL Q(3), D(3)
      COMMON INDEX, NFU
      READ(6,5) Q, AL, IXI, IXF, IYI, IYF, ITR, NDEV, NSF
5     FORMAT(4F12.5, 10I5)
      NFU = 3
      DEFINE FILE NFU(450, 300, U, INDEX)
      IXL = IXF - IXI + 1
      IYL = IYF - IYI + 1
      DO 20 I = 1, ITR
      CALL GQ(Q, D, IXL, IYL, EF, NSF)
      DO 30 J = 1, 3
30    Q(J) = Q(J) - AL*D(J)
      WRITE(NDEV,35) I, Q, D, EF
35    FORMAT(1X, I4, 7F10.4)
20    CONTINUE
70    CALL BELL
      CALL EXIT
      END

C     GRADIENT D OF THE MEAN SQUARED PREDICTION ERROR EF.
C     FILE RECORDS 1-150 HOLD Y, 151-300 HOLD V, 301-450 HOLD LMD.
      SUBROUTINE GQ(Q, D, IXL, IYL, EF, NSF)
      REAL Q(1), D(1)
      REAL K, Y(150), PY(150), V(150), PV(150), LMD(150), PLMD(150)
      DOUBLE PRECISION LA1, LA2, LK, DEF
      COMMON INDEX, NFU
      IP = IYL - 1
      JP = IXL - 1
      IVB = 150
      ILB = 300
      LA1 = 0.
      LA2 = 0.
      LK = 0.
      DEF = 0.
      V(1) = 0.
      LMD(IXL) = 0.
      A1 = Q(1)
      A2 = Q(2)
      K = Q(3)
      DO 5 I = 1, IXL
      PV(I) = 0.
5     PLMD(I) = 0.
      INDEX = 1 + IVB
      WRITE(NFU'INDEX) PV
      INDEX = IYL + ILB
      WRITE(NFU'INDEX) PLMD
      INDEX = 1
      READ(NFU'INDEX) PY
      IF(NSF.EQ.1) WRITE(5,41) PY
41    FORMAT(1X, 10F10.4)
C     FORWARD SWEEP: PREDICTION ERROR V FROM EQUATION (5)
      DO 10 I = 2, IYL
      INDEX = I
      READ(NFU'INDEX) Y
      DO 20 J = 2, IXL
      V(J) = Y(J) - A1*Y(J-1) - A2*PY(J) + A1*A2*PY(J-1)
20    V(J) = V(J) - (K-1)*(A1*V(J-1) + A2*PV(J) - A1*A2*PV(J-1))
      INDEX = I + 150
      WRITE(NFU'INDEX) V
      DO 30 J = 1, IXL
30    PY(J) = Y(J)
      DO 35 J = 2, IXL
35    PV(J) = V(J)
      IF(NSF.EQ.1) WRITE(5,41) Y, V
10    CONTINUE
C     BACKWARD SWEEP: MULTIPLIER FIELD LMD
      DO 40 ISC = 2, IYL
      I = IYL - ISC + 1
      INDEX = I + IVB
      READ(NFU'INDEX) V
      DO 50 JSC = 2, IXL
      J = IXL - JSC + 1
50    LMD(J) = (1-K)*(A1*LMD(J+1) + A2*PLMD(J) - A1*A2*PLMD(J+1))
     1         - 2*V(J)
      DO 73 J = 1, JP
73    PLMD(J) = LMD(J)
      INDEX = I + ILB
      WRITE(NFU'INDEX) LMD
40    CONTINUE
C     ACCUMULATE THE GRADIENT COMPONENTS AND THE SQUARED ERROR
      INDEX = 1
      READ(NFU'INDEX) PY
      INDEX = 1 + IVB
      READ(NFU'INDEX) PV
      DO 130 I = 2, IP
      INDEX = I
      READ(NFU'INDEX) Y
      INDEX = I + IVB
      READ(NFU'INDEX) V
      INDEX = I + ILB
      READ(NFU'INDEX) LMD
      DO 70 J = 2, JP
      LA1 = LA1 + LMD(J)*(Y(J-1) - A2*PY(J-1)
     1      + (K-1)*(V(J-1) - A2*PV(J-1)))
      LA2 = LA2 + LMD(J)*(PY(J) - A1*PY(J-1)
     1      + (K-1)*(PV(J) - A1*PV(J-1)))
      LK = LK + LMD(J)*(A1*V(J-1) + A2*PV(J) - A1*A2*PV(J-1))
      DEF = DEF + V(J)*V(J)
70    CONTINUE
      DO 80 J = 1, IXL
80    PY(J) = Y(J)
      DO 85 J = 2, IXL
85    PV(J) = V(J)
130   CONTINUE
      XY = (IP-1)*(JP-1)
      D(1) = LA1/XY
      D(2) = LA2/XY
      D(3) = LK/XY
      EF = DEF/XY
      RETURN
      END


C     DISPLAY OF THE RESTORED IMAGE AND ITS HISTOGRAM
      REAL HX(256), HY(256), F(150)
      INTEGER XMI, XEXT, YMI, YEXT
      COMMON INDEX, NFU
      READ(6,20) TH, RMEAN, FAC, GAIN, IXI, IXF, IYI, IYF,
     1           NSF, NFU, NFC
20    FORMAT(4F10.5, 20I5)
      READ(6,21) XMI, XEXT, YMI, YEXT
21    FORMAT(4I5)
      DEFINE FILE NFU(450, 300, U, INDEX)
      IVB = 150
      XI = FLOAT(IXI)
      XF = FLOAT(IXF)
      YI = FLOAT(IYI)
      YF = FLOAT(IYF)
      IXL = IXF - IXI + 1
      IYL = IYF - IYI + 1
      XL = XF - XI
      YL = YF - YI
      CALL INITT(0)
      CALL VWINDO(1., XL, 1., YL)
      CALL SWINDO(XMI, XEXT, YMI, YEXT)
      NF = IXL
      IF(NSF.EQ.-1 .OR. NSF.EQ.1) GO TO 225
190   DO 90 I = 1, 256
      HY(I) = 0.0
      HX(I) = FLOAT(I)
90    CONTINUE
225   INDEX = 1
      DO 100 I = 1, IYL
      INDEX = I
      READ(NFU'INDEX) F
      IF(NSF) 120, 110, 119
119   CALL IDA(I, F, NF, IVB, NFC, GAIN)
      INDEX = I
      IF(NSF.EQ.3) WRITE(NFU'INDEX) F
      IF(NSF.EQ.3) GO TO 100
      IF(NSF.GT.1) GO TO 110
120   PPY = YL - FLOAT(I) + 2
      CALL DSPLY(F, PPY, NF, TH)
      GO TO 100
110   CALL HISTOG(F, HY, NF, RMEAN, FAC)
100   CONTINUE
      CALL BELL
      CALL FINITT(0, 780)
      IF(NSF.EQ.0 .OR. NSF.EQ.2)
     1   CALL KBPLOT(HX, HY, 256, 0, 767, 10, 600)
      CALL EXIT
      END

C     SUBTRACT THE STORED PREDICTION ERROR V FROM ROW I OF THE IMAGE
      SUBROUTINE IDA(I, F, NF, IVB, NFC, GAIN)
      REAL F(1), V(150)
      COMMON INDEX, NFU
      INDEX = I + IVB
      READ(NFU'INDEX) V
      DO 10 J = 1, NF
      F(J) = F(J) - V(J)
      IF(NFC.EQ.1) F(J) = F(J) + GAIN*V(J)
10    CONTINUE
      RETURN
      END

C     PLOT THE POINTS OF ONE ROW THAT REACH THE THRESHOLD TH
      SUBROUTINE DSPLY(F, Y, NF, TH)
      REAL F(1)
      DO 10 I = 1, NF
      IF(F(I).LT.TH) GO TO 10
      FFI = FLOAT(I)
      CALL POINTA(FFI, Y)
10    CONTINUE
      RETURN
      END

C     RESCALE ONE ROW TO 1..256 AND ACCUMULATE ITS HISTOGRAM
      SUBROUTINE HISTOG(F, HY, IXL, RMEAN, FAC)
      REAL HY(1), F(1)
      DO 5 I = 1, IXL
      F(I) = F(I)*FAC + RMEAN
      IF(F(I).GT.256.) F(I) = 256.
      IF(F(I).LT.1.) F(I) = 1.
5     CONTINUE
      DO 10 I = 1, IXL
      J = INT(F(I))
      HY(J) = HY(J) + 1.0
10    CONTINUE
      RETURN
      END
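
The noise generator in the first listing forms a Gaussian sample by summing 48 uniform random numbers. A minimal Python sketch of the same idea follows, assuming the uniforms lie in [0, 1): the sum then has mean 24 and variance 48/12 = 4, so subtracting 24 and scaling by 0.5 gives approximately unit variance before multiplying by the desired standard deviation. The function name and the use of Python's `random.Random` are illustrative choices, not part of the original program.

```python
import random


def gauss_clt(s, rng):
    """Approximate N(0, s**2) sample from the sum of 48 uniforms on [0, 1)."""
    total = sum(rng.random() for _ in range(48))
    # The sum has mean 24 and standard deviation 2.
    return (total - 24.0) * 0.5 * s


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [gauss_clt(1.0, rng) for _ in range(10000)]
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    print(mean, var)
```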
LEARNING IN STATISTICAL PATTERN RECOGNITION
ABSTRACT


Machine  learning  has  been  an  area  of  active  research  interest  in  pattern 
recognition and cybernetics. A recent article by Mr. Shimura that appeared in
the Proceedings of the 4IJCPR provided an excellent survey and detailed
bibliography on various learning algorithms for pattern classification. In this
paper, which is primarily tutorial in nature, both the fundamental issues and
recent developments of learning in statistical pattern recognition are discussed in
detail.  Following  an  extensive  review  of  recent  progress  on  parametric  learning 
that  includes  Bayes  and  maximum  likelihood  procedures,  and  the  nonparametric 
learning  particularly  with  the  nonparametric  probability  density  estimation, 
the  interrelationships  among  learning  algorithms  using  adaptive  digital  filtering, 
Kalman  filtering,  stochastic  approximation  and  autoregressive  modeling  are 
examined.  Learning  algorithms  for  contextual  analysis  in  imagery  recognition 
are  developed.  Furthermore,  the  basic  relationships  among  learning  sample  size, 
error  performance  and  the  feature  number  are  considered.  Examples  are  drawn 
from  both  waveform  and  imagery  recognition  studies. 
Abstract

This paper is concerned with the methodologies
in statistical image processing and recognition.
Specific areas considered are the following: (1)
The decision rules in image recognition and their
comparative evaluation under the finite sample size
condition; (2) Statistical feature extraction
techniques for image segmentation, with emphasis on the
statistical characteristics of textural features;
(3) Statistical contextual analysis algorithms for
images, with emphasis on the contextual
preprocessing/postprocessing techniques to implement
the optimum decision rules with context; (4)
Statistical image modelling techniques including the
nonhomogeneous models and the autoregressive models.
The software problems involved in these areas are
also examined in detail.

I. Introduction

There has been strong demand for realtime or
near realtime operations with images in practical
applications. The statistical information is most
important in the imagery data in many such
applications. In the past two decades, the areas of
statistical pattern recognition and image processing
were developed simultaneously but almost
independently. Thus efforts should now be made to
consolidate the research in both areas to meet the
unusual requirements and the increasingly complex
nature of imagery data in practical applications.

This paper is semi-tutorial in nature and it covers
the major methodologies and associated software
implementation problems in the topics of decision
rules, feature extraction, contextual analysis, and
image models. New results include the adaptive
Kalman filtering for image enhancement.

Every image analysis algorithm should be
developed with software/hardware implementation in
mind. In general, local operations are faster but
use less contextual information. Global operations
require much more computation. Both software
and hardware developments will be extremely
important in determining the future progress in
image processing and recognition.

II.  Comparative  Evaluation  of  Decision  Rules 

Classification is an important step in statistical
information processing in general. The choice of
classification (decision) rules may make a large
difference in recognition performance. The
comparative performance evaluation of statistical
classification rules is a fundamental but unsolved
problem [1] [2]. For real-time operation, the
images may be received at a high data rate such
that each image has to be processed within a very
limited time interval. As a result a limited number
of measurements is used in actual processing.
Similarly, in on-board processing both time and equipment
are limited. The problem of finite sample size thus
becomes important. For imagery data the finite
sample constraints include not only the limited
number of learning samples but also the finite number
of quantization levels. Most research on statistical
pattern recognition has been based on the
assumption of known parameters or the availability
of a large or infinite number of samples to estimate
the parameters or probability distributions. Under
the finite sample constraints, these assumptions are
not valid and the behavior of decision rules must
be re-examined.

Typical decision rules employed in image
recognition are the maximum likelihood decision rule
(MLDR), the nearest neighbor decision rule (NNDR),
Fisher's linear discriminant, the sequential
decision procedure, and the decision tree schemes.
The MLDR is optimum in the sense of minimizing the
error probability with respect to given a priori
probability distributions. Under limited sample
size, the estimated statistical parameters may be highly
inaccurate. Furthermore, the assumption of a
statistical distribution itself may not be valid. The use
of the nearest neighbor decision rule is a logical
choice under these circumstances. Asymptotically,
the error rate of the NNDR is upper bounded by twice
the Bayes error. However, the required computation
in the NNDR is considerably greater and the actual
performance can be worse than that of the MLDR. Many
researchers have reported better performance with
some modifications of the MLDR and NNDR for real
data [3]. It is believed that the finite sample
constraint is mainly responsible for the deviation
of actual performance from theoretical performance.
Recent work on finite sample decision rules has
been reported by Chen, Pau and Kittler [4,5,6]. An
earlier study [7] has shown that under moderate
sample size and Gaussian distribution, the
performances of the MLDR and NNDR are comparable. Standard
software is available to implement the MLDR.

Computationally, much work has been done for the
NNDR to reduce the number of distance calculations
needed to find the nearest neighbors. Preprocessing is
needed but inexpensive in all algorithms. Efficient
software implementation of the NNDR, however, is
a challenging problem. By ordering the samples
according to projections, Friedman et al. [8]
developed an algorithm that finds the k nearest
neighbors of a point, from a sample of size N in a
d-dimensional space, with the expected number of
distance calculations given by

$$ E[n_d] \lesssim \pi^{-1/2} \left[ k d \, \Gamma(d/2) \right]^{1/d} (2N)^{1 - 1/d} \tag{1} $$

Eq. 1 is the only analytical expression available
for computation complexity. The brute force method
would require Nd distance calculations. The amount
of computational saving by the algorithm depends on
k, d, and N. As a typical example, for k = 1, d = 2,
N = 100 and Gaussian samples, the algorithm requires
only 10% of distance calculations for the brute
force  method.  Another  algorithm  uses  the  branch 
and  bound  method  [9].  The  amount  of  saving  is  even 
more dramatic even though more preprocessing is required.
In the maximum likelihood decision rule,
consider  again  the  Gaussian  case,  the  equivalent 
number  of  calculations  in  computing  the  covariance 
matrix  is  also  Nd.  The  number  of  calculations  for 
the  quadratic  form  that  appears  in  the  exponent  is 
bound  by  d4.  Thus  computationally  the  NNDR  can  be 
made  more  attractive  than  the  MLDR  and  thus  more 
suitable  for  realtime  operations.  However,  the 
memory  or  storage  requirement  definitely  is  in 
favor  of  the  MLDR. 
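
The quoted example can be checked by evaluating the bound directly. This is a small sketch assuming the expression reads pi^(-1/2) [k d Gamma(d/2)]^(1/d) (2N)^(1 - 1/d); the function name is hypothetical.

```python
import math


def expected_distance_calcs(k, d, n):
    """Evaluate pi**-0.5 * (k*d*Gamma(d/2))**(1/d) * (2n)**(1 - 1/d)."""
    return (math.pi ** -0.5
            * (k * d * math.gamma(d / 2.0)) ** (1.0 / d)
            * (2.0 * n) ** (1.0 - 1.0 / d))


if __name__ == "__main__":
    calcs = expected_distance_calcs(k=1, d=2, n=100)
    # Fraction of the N = 100 distance calculations of an exhaustive search.
    print(calcs, calcs / 100)
```

For k = 1, d = 2 and N = 100 the expression evaluates to about 11.3, i.e. roughly a tenth of an exhaustive search over 100 samples, in line with the figure given in the text.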

Among other decision rules, the Fisher linear
discriminant is very effective if the second order
statistics are sufficient for the data. The
sequential decision procedures [10] are useful when the
cost of measurements is significant enough to take
into account in decision making. The decision tree
classifier is most promising for realtime operation
needs with the imagery data. With a pre-designed
linear binary tree classifier, the overall
computation time can be less than ten percent of that of a
single stage classifier [11]. The combined feature
selection and tree classifier design approach [12],
[13] appears to be most promising for imagery
recognition in terms of both computation and performance.
If tree search methods are used, the software
implementation will be more complicated. Theoretical
evaluation of tree classifiers is generally
difficult. If the memory capacity is not a major
constraint, the table look-up approach [14] has been
considered for the implementation of MLDR. It is
also noted that the "peaking" phenomenon exists
among sample size, dimensionality and error
probability. It is not clear which decision rule is
least sensitive to such phenomenon.

III.  Statistical  Feature  Extraction 


In statistical pattern recognition, features
are usually extracted by evaluating the distance or
information measures. When the sample size is
limited, errors result in the computation of these
measures [4]. For image classification and
segmentation, textural features derived from the histogram
and co-occurrence matrices of gray levels are the
most useful features (see e.g. [15, 16, 17]). There
are at least thirty textural features proposed so
far that take into account the coarseness, contrast,
directionality, line-likeness, regularity, roughness
and other properties. Information provided by
grayscale histograms alone is usually not enough for
classification or segmentation. Computation of
co-occurrence matrices however is time consuming. Some
preprocessing operations such as the spatial
differentiations (e.g. gradient and modified gradient
[18]) and histogram manipulation may precede the
textural feature extraction and lead to more
effective features. Statistical analysis of preprocessed
pictures will also help to differentiate different
regions in a picture, which have different
statistical characteristics such as the skewness,
kurtosis, and bimodality of the grayscale histograms
[19], [20].

It is noted that preprocessing for feature
extraction is usually less expensive than
co-occurrence matrix computation. Some second order
statistical property, however, is desirable or even
necessary. Software development is needed for such
computation. For the quantization problem, the
relationship among the number of quantization levels,
dimension of the vector measurement, and the number
of samples has been considered. With properly
chosen quantization, a small number of quantization
levels can still be very effective. The binary
threshold picture is a good example. As the local
statistics of picture elements on both sides of an
object boundary are not the same, effective
algorithms can be developed for segmentation guided by
statistical principles. For image classification
statistical features can be effectively extracted
from orthogonal transforms of the pictures. Both
global and local properties can be considered in
feature extraction [21].


IV.  Statistical  Contextual  Analysis 

There has been much success in using the
statistical contextual information in character
recognition [22]. In image recognition, interpretation
and segmentation, the rich contextual information
can be described statistically. There is no doubt
that significant improvement over the existing
results will be available if the statistical
contextual information is fully utilized. One approach
is to derive statistical models for the image.
This is the subject of the next section. A formal
statistical approach to the problem is the compound
decision theory. Suppose an image is partitioned
into a number of subimages (cells) and it is desired
to classify each cell. The assumption of dependence
on neighboring cells only is reasonable. Let $x_0$ be
the vector measurement of the cell under
consideration. Then the compound decision rule is to choose
the pattern class which maximizes [23]

$$ p(x_0 \mid \omega_k) \, P(\omega_k) \prod_{j=1}^{8} p(x_j \mid \omega_k) \tag{2} $$

for 8-neighbor dependence. Here $k = 1, 2, \ldots, m$ and $m$
is the total number of classes. Implementation of
the maximum likelihood decision rules given by
Eq. (2) is straightforward provided that the
probability densities are known. This information,
however, is not given. It is noted that in Eq. (2)
it is assumed that the true classes of neighbors are
also unknown. With some manipulation, Eq. (2) can
be reduced to the form which requires the transition
probabilities among the neighboring cells [23].

This problem is closely related to probabilistic
scene labeling, which is a subject of scene
analysis [24], [25]. Alternatively, some
preclassification can be performed. The result will assist
the machine to learn the probability densities for
Eq. (2). After classification, corrections can be
made about the knowledge of true classes of
neighboring cells. The procedure may be repeated
several times until consistent decisions are made for
each cell. This contextual post-processing idea,
which was shown to be very powerful in character
recognition, should be effective also in statistical
image recognition.
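
The compound decision rule can be illustrated in log form: for each class, add the log prior, the log density of the center cell, and the log densities of the neighboring cells, then pick the class with the largest total. The sketch below is illustrative only. It assumes scalar measurements and, purely for simplicity, that the neighbor density given class k is the same Gaussian as the class-k density; the paper does not specify the densities, and all names are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log of a scalar Gaussian density."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def compound_decision(x0, neighbors, classes):
    """Index of the class maximizing the compound score, in log form.

    classes: list of (prior, mean, var) tuples, one per class (assumed
    Gaussian). neighbors: measurements of the neighboring cells.
    """
    best_k, best_score = None, -math.inf
    for k, (prior, mean, var) in enumerate(classes):
        score = math.log(prior) + log_gauss(x0, mean, var)
        score += sum(log_gauss(xj, mean, var) for xj in neighbors)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

With two equally likely classes centered at 0 and 5, a center measurement of 2.6 taken alone falls to the second class, but eight neighbors near 0.2 pull the compound decision to the first class, which is the kind of contextual correction described above.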

V.  Statistical  Image  Modelling 

Because of the random nature of imagery data,
there have been several attempts to model the image
statistically. Obviously it would be difficult to
fully characterize a real image by a single model.
However, modelling will facilitate computer
processing and will be necessary for real-time operations.
Modelling takes into account the inter-pixel
dependence. The Markov random field is the most typical
assumption (e.g. [26,27,28]) in statistical models.
Because of the object boundaries which exist in
images of military applications, the homogeneous
random field assumption is not appropriate. For
this reason the image scan lines can be modeled as
a Markov jump process [29] which leads to
non-linear noise reduction. The image can also be
modeled as a marked point process evolving
according to a spatial parameter [30]. In another
approach the image is considered as a spatially
variant linear system superimposed by non-linear
elements corresponding to object boundaries. Two
different methods have been proposed [31], [32] to
perform recursive filtering of noisy images for
such image models.

In our model [32] an adaptive Kalman filtering
method is used for image enhancement. The method is
recursive and suitable for real-time operation,
requires little parametric information about the
image model, and adapts to the textural and
temporal variations in the image. The basic idea is
that the Kalman filter is implemented on the
assumption that there are no state jumps, and a
second system is designed to monitor the
measurement residuals of the filter to determine
whether a change has occurred and to adjust the
filter accordingly. In picture processing, state
jumps correspond to the object boundaries. The
second system operates in parallel with the Kalman
filter to detect the possible state jumps and to
estimate their positions and amplitudes. The
results of detection and estimation are fed back to
the Kalman filter to update its operation. A
generalized likelihood ratio test is used for the
second system, with the assumption that all the
relevant densities are Gaussian. A typical computer
result, illustrated in Figure 1, shows the image
enhancement by adaptive Kalman filtering at low
signal-to-noise ratio. The improvement is very
significant as compared with the modified gradient
method, which is a local operation, and the
computation time is not much larger.
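
The two-system structure described above can be illustrated on a single scan line. The sketch below is a simplification under stated assumptions and is not the filter of [32]: the scalar random-walk state model, the parameters `q`, `r` and `threshold`, and the function name are hypothetical, and a single-sample test on the squared normalized residual stands in for the generalized likelihood ratio test.

```python
import numpy as np

def adaptive_kalman_scanline(z, q=1e-4, r=1.0, threshold=9.0):
    """Filter one scan line with a scalar Kalman filter plus a residual monitor.

    Assumed model (illustrative only): x[k] = x[k-1] + w, z[k] = x[k] + v,
    with var(w) = q and var(v) = r. A jump is declared when the squared
    normalized residual exceeds `threshold`; this is a simple stand-in for
    a generalized likelihood ratio test, not an implementation of one.
    """
    z = np.asarray(z, dtype=float)
    x_hat = np.empty_like(z)
    jumps = []
    x, p = z[0], r                      # initialize from the first measurement
    x_hat[0] = x
    for k in range(1, len(z)):
        p_pred = p + q                  # prediction under the no-jump assumption
        resid = z[k] - x                # measurement residual
        s = p_pred + r                  # residual variance under the no-jump model
        if resid ** 2 / s > threshold:  # monitor: residual too large for the model
            jumps.append(k)
            p_pred = 1e6 * r            # feedback: drop confidence in the old state
            s = p_pred + r
        gain = p_pred / s
        x = x + gain * resid            # measurement update
        p = (1.0 - gain) * p_pred
        x_hat[k] = x
    return x_hat, jumps
```

Inflating the predicted variance when a jump is declared makes the gain close to one, so the estimate restarts from the current measurement instead of smoothing across the boundary. The returned `jumps` list holds the detected boundary positions along the scan line.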


An efficient procedure to take into account
the local dependence is the statistical theory of
nearest-neighbor systems on a lattice [33]. Let r
and s be the row number and column number associated
with a picture element (pixel) $x_{rs}$. A
simultaneous model is


$$x_{rs} = \beta_1 (x_{r-1,s} + x_{r+1,s}) + \beta_2 (x_{r,s-1} + x_{r,s+1}) + Y_{rs}$$    (3)


where $r = 1, 2, \ldots, M$; $s = 1, 2, \ldots, N$, and $\{Y_{rs}\}$ is an
uncorrelated Gaussian noise process with $E[Y_{rs}] = 0$ and
$\mathrm{var}\{Y_{rs}\} = \sigma_i^2$ for class $i = 1, 2, \ldots, m$. That is,
the variance of Y differs among classes. An alternative model
considered [34] is


$$x_{rs} - \mu_i = \beta_i \{ (x_{r-1,s} - \mu_i) + (x_{r,s-1} - \mu_i) \} + Y_{rs}$$    (4)


which  may  be  written  as 

$$x_{rs} = \alpha_i + \beta_i [ x_{r-1,s} + x_{r,s-1} ] + Y_{rs}$$    (5)


The parameters ($\alpha$ and $\beta$) can be determined from the
least squares estimates [34]. The probability
density of the observations $\{x_{rs}\}$ of the image for a
given class may follow the Gaussian density.
Maximum likelihood decision rules can then be used
for classification. For the model represented by
Eq. (3), the parameters ($\beta_1$, $\beta_2$) can also be
estimated from the data, even though this is not a
simple least squares problem. It is important to
note that the first order linear autoregressive
models given above can be easily extended to higher
order (such as second order) dependence. The
parameter $\beta$ represents spatial correlation among
pixels.
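
As an illustration of the least squares and maximum likelihood steps for the model of Eq. (5), the following sketch may be helpful. It is written under stated assumptions and is not the procedure of [34]: the function names, the use of one residual variance per class, and the synthetic-data generator are hypothetical. Expanding Eq. (4) shows that the constant term of Eq. (5) equals the class level multiplied by (1 - 2β), which the generator uses to set its border values.

```python
import numpy as np

def fit_causal_ar(img):
    """Least squares estimates (alpha, beta, sigma2) for the assumed form
    x[r,s] = alpha + beta * (x[r-1,s] + x[r,s-1]) + Y[r,s] of Eq. (5)."""
    img = np.asarray(img, dtype=float)
    target = img[1:, 1:].ravel()
    nbr_sum = (img[:-1, 1:] + img[1:, :-1]).ravel()    # x[r-1,s] + x[r,s-1]
    design = np.column_stack([np.ones_like(nbr_sum), nbr_sum])
    (alpha, beta), *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ np.array([alpha, beta])
    return alpha, beta, resid.var()

def log_likelihood(img, alpha, beta, sigma2):
    """Gaussian log-likelihood of the residuals of Eq. (5) for one class."""
    img = np.asarray(img, dtype=float)
    resid = img[1:, 1:] - alpha - beta * (img[:-1, 1:] + img[1:, :-1])
    return (-0.5 * resid.size * np.log(2.0 * np.pi * sigma2)
            - 0.5 * np.sum(resid ** 2) / sigma2)

def classify(img, class_params):
    """Maximum likelihood decision over a list of (alpha, beta, sigma2)."""
    scores = [log_likelihood(img, *params) for params in class_params]
    return int(np.argmax(scores))

def simulate_causal_ar(alpha, beta, sigma, shape, seed=0):
    """Synthetic image from Eq. (5), for illustration; needs |2*beta| < 1."""
    rng = np.random.default_rng(seed)
    rows, cols = shape
    # Border row and column are held at the level alpha / (1 - 2*beta).
    x = np.full((rows + 1, cols + 1), alpha / (1.0 - 2.0 * beta))
    for r in range(1, rows + 1):
        for s in range(1, cols + 1):
            x[r, s] = alpha + beta * (x[r - 1, s] + x[r, s - 1]) + sigma * rng.normal()
    return x[1:, 1:]
```

In use, `fit_causal_ar` would be applied to a training image of each class, and `classify` would then assign a test image to the class with the largest log-likelihood. The same design-matrix approach extends to higher order dependence by adding columns for further neighbors.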


Both the nonparametric adaptive filtering method
and the parametric autoregressive method are very
powerful and general image analysis approaches
which can be efficiently implemented in software.
However, their limitation is also evident. The
structural properties of imagery patterns are not
considered at all, and it is generally difficult to
incorporate any structural information in the
models above. In using the models, it is necessary
to make sure that their assumptions are at least
approximately correct.

As a concluding remark, statistical image
processing and recognition present many challenging
problems to software engineers. Future software
development will certainly be very helpful to
progress in this area.

Figure 1a - Typical Aircraft Image With Signal-to-Noise Ratio of 1.8

Figure 1b - Adaptively Filtered Image Corresponding to Figure 1a